[Figure: block diagram with DMA, CPU, switch fabric, Ethernet, and SoC IP blocks.]
How rings assign address space

Step 1: align incoming address to self (to some power of 2)
Step 2: assign the result to self address

Step 3: next_addr = self_addr + self_addr_space; // number of registers used locally
Step 4: send down next_addr

Example:

DMA needs 16 addresses

UART needs 4, self = 32
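
The four steps above can be sketched as a short simulation. This is only an illustration under stated assumptions: the function names are hypothetical, each member's address space is taken to be a power of 2 already, and "align" is read as rounding the incoming address up to a multiple of that space. The input values are chosen for illustration and are not taken from the figure.

```python
def align_up(addr: int, size: int) -> int:
    """Round addr up to the next multiple of size (size must be a power of 2)."""
    assert size > 0 and size & (size - 1) == 0, "size must be a power of 2"
    return (addr + size - 1) & ~(size - 1)


def enumerate_ring(incoming: int, spaces: list) -> list:
    """Return (self_addr, next_addr) for each member, in ring order.

    `spaces` lists the address space each member needs (assumed powers of 2).
    """
    result = []
    addr = incoming
    for space in spaces:
        self_addr = align_up(addr, space)  # Steps 1 and 2
        next_addr = self_addr + space      # Step 3
        result.append((self_addr, next_addr))
        addr = next_addr                   # Step 4: send down to the next member
    return result


# A member needing 16 addresses followed by one needing 4,
# with a hypothetical incoming address of 20.
assignments = enumerate_ring(20, [16, 4])
print(assignments)
```

Because every self address is aligned to the member's own power of 2, the lower address bits can later be used directly as an internal register index.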
If the delay between "clk1" and "clk2" is greater than the delay from Q to D of the
second flip-flop, we have a race on our hands, meaning the right-hand flip-flop will
sample the data of Q a whole clock period early.
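
The race condition stated above reduces to a single comparison. The sketch below is a simplified model, not a timing sign-off check: the function name and the nanosecond figures are hypothetical, and flip-flop hold time is assumed to be folded into the Q-to-D delay.

```python
def has_race(clock_skew_ns: float, q_to_d_delay_ns: float) -> bool:
    """Return True when the receiving clock arrives later than the new data.

    clock_skew_ns:   delay from clk1 to clk2 (positive when clk2 is later).
    q_to_d_delay_ns: delay from Q of the first flip-flop to D of the second.
    """
    return clock_skew_ns > q_to_d_delay_ns


# clk2 is 0.8 ns late but the data needs only 0.5 ns: the data wins the race.
print(has_race(0.8, 0.5))
# clk2 is 0.2 ns late and the data needs 0.5 ns: sampling is safe.
print(has_race(0.2, 0.5))
```

A negative skew, where the receiving clock arrives before the sending clock, can never satisfy the comparison in this model.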

[Figure: compound A and compound B, with clock clk2.]

Clock runs with data:

The problem is a possible race.
However, we control the logic on each flip-flop leaving the compound.
Because it is always the same standard ring-interface module, we can ensure
that the delay will be at least enough.
More importantly, this is easily checked after layout.
[Figure: compound A and compound B, with signals data_a and data_b.]

Clock opposes data:

This arrangement has the advantage of automatically ensuring that no race
condition exists (at least in this simple case).

[Figure: timing waveforms, including Qa and data_a.]

data_a, which changes after clkb (which is later than clka), is sampled
by clka. NO RACE.

[Figure: compound A and compound B with clka, Qa, and the OK signal; reference numerals 70 and 90.]
[Figure: waveforms of clka and data_a, with the uncertainty range marked.]



data_a leaving the bridge goes to member "b" and should be sampled there by the
rising edge of clkb. clkb lags a lot behind clka of the bridge. As clearly seen
from the waveforms, a race is imminent. Here we should add latches for all the
data lines (90). Adding a latch works, however, only if the delay between
clka and clkb is less than 75% of the cycle time; otherwise the uncertainty
kills the usable time. This sets a hard limit on the number of ring members.
Also keep in mind that latches are needed on each OK signal between members
of the ring.
[Figure: data waveform with the uncertainty range marked.]



Here, data_b leaves member "b" to be sampled by clka in the bridge. But now clkb
lags a lot behind clka. This actually works to our advantage, if the lag is
smaller than the better part of a clock cycle. This solution looks better
because, between adjacent members, we can take care to delay the data beyond
the danger zone of clock delay; the OK signals are covered automatically, and
the last-leg data is also covered. The only signal not safe is the OK from the
bridge to the "b" member. It will need a latch in "b".

[Figure: big module with a ring interface; labels: local_clock, local_data_out, data from previous member, clk, clock.]



The local clock lags behind the ring-interface clock of this module, because we
presume the module is big. For data coming out, this is not a problem: it
changes later than the clock of the ring-i/f flip-flops. However, for data
entering the module from the previous member, a race is a possibility we must
look into.
If module "a" sends a message to module "b", the ring works
fine. However, if most of the traffic is from "c" to "b",
this is more expensive in terms of latency.



Another problem is "peak latency". Suppose that "a"
transmits mostly to "d" and "b" mostly to "c". In this case
communication between "b" and "c" suffers degradation
when the traffic peaks coincide.
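
Both latency effects can be made concrete with a small model. The sketch assumes a one-directional ring whose members sit in the order a, b, c, d and that each hop costs one clock; the function names are hypothetical.

```python
RING = ["a", "b", "c", "d"]  # assumed ring order, traffic flows a -> b -> c -> d -> a


def hops(src: str, dst: str) -> int:
    """Number of ring hops from src to dst in the single ring direction."""
    return (RING.index(dst) - RING.index(src)) % len(RING)


def links(src: str, dst: str) -> set:
    """Set of (from, to) ring segments a message from src to dst occupies."""
    used = set()
    i = RING.index(src)
    for _ in range(hops(src, dst)):
        j = (i + 1) % len(RING)
        used.add((RING[i], RING[j]))
        i = j
    return used


print(hops("a", "b"))  # neighbour in ring direction
print(hops("c", "b"))  # must travel almost the whole ring
# Segments shared by a->d traffic and b->c traffic: contention when peaks coincide.
print(links("a", "d") & links("b", "c"))
```

In this model the "c" to "b" message needs three hops where "a" to "b" needs one, and the b-to-c segment is shared by both traffic flows.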




Fig. 13
The land bridge gets its name from the fact that it is
a luxury. It spans across connected modules.
The idea is simple. When V2 sends a message to
D1, it gets to one side of the bridge. This side
analyzes the destination address and by some
magic (explained later) decides to short-cut the
path. The message re-appears at the other end of
the bridge and gets to D1 fast. By the same magic,
a message from V1 to D2 also gets bypassed;
a message from V1 to D1 is treated directly.




Enumeration is started by the "Anchor",
which assigns address=1 to itself; the results
of enumeration are labels 1 to 7. The land
bridge gets two addresses, as if it were not
one module: there is a "near" end, which got
enumeration label "3", and a "far" end,
marked 6.
The bridge takes responsibility for strays,
but only at the "far" end. During
enumeration, the bridge is "polarized" to
have a near and a far end. Near is the end
first struck by the enumeration message.



So we have exactly one enforcer for each
ring.



[Figure: land bridge with the near end labeled 3 and the far end labeled 11.]



In a land bridge ring, the situation is trickier. If V2
sends a message to address=5, the land bridge
diverts it at the 11/far end; it will re-appear at 3 and
start cycling forever.

We have to define an algorithm that will take
care of all cases.

Luckily there is a way.

The land bridge deals only with messages arriving
at the far end and being diverted. It marks and
monitors only those. Messages arriving at the near
end keep their markings. Messages at the far end
going through are left alone.
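
The marking rule can be exercised with a toy ring walk. Several details below are assumptions made only for illustration and are not stated in the text: the ring has 12 enumeration labels, the far end diverts a message when its destination lies between the near and far labels, and a marked message that shows up at the far end a second time is dropped as a stray. All names are hypothetical.

```python
from dataclasses import dataclass

NEAR, FAR = 3, 11   # bridge end labels from the figure
RING_SIZE = 12      # hypothetical number of enumeration labels


@dataclass
class Message:
    dest: int
    marked: bool = False  # set by the bridge when it diverts the message


def far_end(msg: Message) -> str:
    """Decide what the far end does: 'through', 'divert', or 'drop'."""
    if not (NEAR <= msg.dest < FAR):  # assumed divert rule
        return "through"              # left alone, marking untouched
    if msg.marked:
        return "drop"                 # diverted before and never claimed: a stray
    msg.marked = True
    return "divert"


def route(start: int, dest: int, members: set, max_steps: int = 100):
    """Walk a message around the ring; return (trace of labels, final status)."""
    msg = Message(dest)
    pos = start
    trace = [pos]
    for _ in range(max_steps):
        if pos == dest and pos in members:
            return trace, "delivered"
        if pos == FAR:
            action = far_end(msg)
            if action == "drop":
                return trace, "dropped"
            pos = NEAR if action == "divert" else pos % RING_SIZE + 1
        else:
            pos = pos % RING_SIZE + 1
        trace.append(pos)
    return trace, "timeout"


everyone = set(range(1, RING_SIZE + 1))
print(route(7, 5, everyone))        # diverted once, then delivered
print(route(7, 5, everyone - {5}))  # nobody owns address 5: caught on the second pass
```

Without the mark, the second call would keep looping between labels 3 and 11 until the step limit; with it, the far end recognises the message on its second arrival.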



[Figure: link from member A carrying type, 64-bit data, ok, scan test, reset, and clk.]

[Figure: waveforms of clk, type (idle, msgA, idle), and ok.]



During the first clock, OK remains active while type
is msgA. This means that on the next clock
member A may send a new message; member A uses
this OK to send msgB on the next clock. msgB gets
stuck for a clock because OK goes inactive. It goes
inactive because the FIFO in member B is full. One
clock later, the FIFO has a free entry, so OK returns to
1 and type returns to idle on the next clock. The return
to idle could also be a change to the next message, if
there was one.
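
The handshake described above can be replayed clock by clock. The sketch is a behavioural model under assumptions of my own: member B has a one-entry FIFO, OK is simply "FIFO not full", and B drains one entry at the clocks listed in `drain_at`. All names are hypothetical.

```python
def simulate(messages, fifo_depth, drain_at, max_clocks=50):
    """Return a per-clock trace of (type on the link, ok) until all messages are taken.

    messages:   messages member A wants to send, in order.
    fifo_depth: number of entries in member B's FIFO (assumed).
    drain_at:   clock indices at which member B frees one FIFO entry (assumed).
    """
    pending = list(messages)
    fifo = []
    on_wire = None
    trace = []
    for clock in range(max_clocks):
        if on_wire is None:
            if not pending:
                break                      # nothing left: the link returns to idle
            on_wire = pending.pop(0)       # member A drives the next message
        ok = len(fifo) < fifo_depth        # member B signals whether it can accept
        trace.append((on_wire, ok))
        if ok:
            fifo.append(on_wire)           # accepted this clock
            on_wire = None
        # otherwise the message stays on the link (stuck) for another clock
        if clock in drain_at and fifo:
            fifo.pop(0)                    # member B consumes one entry
    return trace


print(simulate(["msgA", "msgB"], fifo_depth=1, drain_at={1}))
```

The trace shows msgA accepted with OK active, msgB held for one clock with OK inactive, and msgB accepted once the FIFO has a free entry.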

[Figure: member ports imsg, iok, omsg, ook, umsg, dok, uok.]
[Figure: member A and member B, each with imsg, iok, omsg, and ook ports.]



The incoming messages are examined first
to see whether they are supervisor or work/program messages.
Work/program messages have an address field.
We check whether it is our address. Since we know
that our address is aligned to our power of 2,
the address mask (named the split mask)
causes only a certain number of upper bits to
be compared. The lower part of the address
is passed inside as the internal address. The
upper bits are compared against the self-address
register. This register gets its value during the
enumeration protocol. The lower part of this
register is always masked. Hopefully
synthesis will delete the unused bits in
implementation.
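
The split-mask comparison can be sketched as follows. The 24-bit address width and the function name are assumptions made for this illustration; `space_bits` stands for the log2 of the member's address space.

```python
ADDR_BITS = 24  # assumed address width, for illustration only


def classify(incoming: int, self_addr: int, space_bits: int):
    """Return (ours, internal_address).

    The split mask keeps only the upper bits for comparison against the
    self-address register; the lower bits become the internal address.
    """
    low_mask = (1 << space_bits) - 1
    split_mask = ((1 << ADDR_BITS) - 1) & ~low_mask
    ours = (incoming & split_mask) == (self_addr & split_mask)
    internal = (incoming & low_mask) if ours else None
    return ours, internal


# A member enumerated at 0x20 with a 16-entry address space (4 low bits).
print(classify(0x25, 0x20, 4))  # ours, internal address 5
print(classify(0x35, 0x20, 4))  # not ours: passed through
```

Masking the self address with the same split mask reflects the note that the lower part of the register is always a don't-care.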

[Fig. 21: the incoming address and the address split mask feed a comparator against the self-address, whose lower part is don't care; the result is ours/through, and the lower part of the address enters the member.]
[Figure: member 300 with Rif_i_*, Rif_*, and Rif_o_* ports, module_id, Address Space = 7, and an activation register.]
Member (300), RIF signals:

Rif_i_options[5:0]
Rif_i_addr[*:0]    (* = address_space)
Rif_i_datal/h[31:0]
Rif_i_write
Rif_i_read
ok signal (name illegible)
Rif_i_clock

Activation register
[Fig. 33: member 300 with RIF; ports Rif_i_*, Rif_*, Rif_o_*, module_id; Address Space = 7; activation register.]
The second land bridge solves most traffic problems, but
adds 4 clocks to the overall ring length. This is not a big
problem because no message should travel the whole
perimeter.




The Utopia interface is
forced into a mode that
communicates in
messages, not cells. We are
using the I/O and maybe
some of the logic.



Network Processor

Application Specific Accelerators: CRC, Encryption, Table Lookup, Hashing
Internal Memory: Fast, Unified, Multi-port
Peripheral Expansion Area: Enet, ATM, Uart, USB, Serials
System Expansion Area: CPU (PP), DMA, Smart FIFO, Ext. mem I/F
[Figure: Core Debug, DMA Agent, External World.]
[Figure: Vobla Compound.]
[Figure labels: General purpose register; interface; Internal memory; Special purpose; Configuration; External memory.]

32 registers of 32 bits per task
A set of indications, per task, which control task execution scheduling
An interface to adjacent resources
Fast memory accessed by load/store instructions
Per task and global registers
Initialized by the PP
Big area accessed via a DMA interface
R1 register:

[Bit-field layout not recoverable.]

s - sticky bit
eq - equal/zero
lt - less than/negative
gt - greater than/positive
c - carry
mb - reflection of the RAM multi-reader busy indication
Frame structure of an example task type

common task data
task fragment 1 data
task fragment 2 data
task fragment 3 data
data of all level 2 functions
level 1 f1 data
level 1 f2 data
level 1 f3 data

Size of the level 0 frame part is different for each task type.
Size of the level 2 frame part is constant.
Size of the level 1 frame part is different for each task type.



[Figure: vector, align, register file, data align, special purpose regs.]



APP_ID=10064331







[Legend: symbols for Flip Flop and Logic.]
[Figure: yield indication and stall signals (memory stall, reader stall, and others illegible); DMA agent with message in and message out; ring_in feeds a splitter, which passes "not our message" traffic on to ringif.]
type_in
_interface_address[15:0]
_interface_data[31:0]

type_out[7:0] (also CRC)
address_out[23:0]
data_out[63:0] (also CRC)
ok2drive

memory data: data[add0] | data[add1] | data[add2] | data[add3] | data[add4] | data[add5] | data[add6] | data[add7]

byte[0] | byte[1] | byte[2] | byte[3] | byte[4] | ...

command: opcode | options[5:0] | RA | AID[5:0] | RB/imm8
[Figure: agent command (AID=multireader) and the vobla entry in the multireader: source address[23:0], count[7:0] to transfer.]

[Figure: agent interface, entries, and message encoder; signals: clk, ok2drive, agent command, opcode; message fields: message data, address_of_destination, message type.]